
17 research outputs found

Baryonic and mesonic 3-point functions with open spin indices

Author: Bali Gunnar S.
Collins Sara
Glässle Benjamin
Heybrock Simon
Korcyl Piotr
Löffler Marius
Rödl Rudolf
Schäfer Andreas
Publication venue: 'EDP Sciences'
Publication date: 07/11/2017

We have implemented a new way of computing three-point correlation functions. It is based on a factorization of the entire correlation function into two parts which are evaluated with open spin- (and to some extent flavor-) indices. This allows us to estimate the two contributions simultaneously for many different initial and final states and momenta, with little computational overhead. We explain this factorization as well as its efficient implementation in a new library which has been written to provide the necessary functionality on modern parallel architectures and on CPUs, including Intel's Xeon Phi series. Comment: 7 pages, 5 figures, Proceedings of Lattice 201

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Jagiellonian University Repository

Lattice QCD with Domain Decomposition on Intel Xeon Phi Co-Processors

Author: Dubey Pradeep
Heybrock Simon
Joó Bálint
Kalamkar Dhiraj D.
Smelyanskiy Mikhail
Vaidyanathan Karthikeyan
Wettig Tilo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/12/2014

The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel Xeon Phi co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5. Comment: 12 pages, 7 figures, presented at Supercomputing 2014, November 16-21, 2014, New Orleans, Louisiana, USA, speaker Simon Heybrock; SC '14 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pages 69-80, IEEE Press Piscataway, NJ, USA (c)201
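
As a rough illustration of the domain-decomposition idea mentioned in this abstract, the sketch below runs a non-overlapping multiplicative Schwarz iteration (equivalent to block Gauss-Seidel) on a small, diagonally dominant tridiagonal system. It is not the authors' solver: the toy matrix, the function names `schwarz_sweep` and `solve_dd`, and the use of exact dense block solves are all assumptions made for illustration. The point it shows is that each block solve touches only local data, which is what reduces data movement.

```python
import numpy as np

def schwarz_sweep(A, b, x, blocks):
    """One multiplicative Schwarz sweep over non-overlapping blocks.

    Each block is solved exactly against the current global residual,
    so later blocks already see the updates made by earlier ones.
    """
    for idx in blocks:
        r = b - A @ x                                  # current global residual
        x[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return x

def solve_dd(A, b, blocks, tol=1e-10, max_sweeps=500):
    """Repeat Schwarz sweeps until the relative residual drops below tol."""
    x = np.zeros_like(b, dtype=float)
    for sweep in range(1, max_sweeps + 1):
        x = schwarz_sweep(A, b, x, blocks)
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, sweep
    return x, max_sweeps

if __name__ == "__main__":
    n = 16
    # Hypothetical test operator: shifted 1D Laplacian (symmetric positive definite).
    A = 2.5 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    b = np.ones(n)
    blocks = np.array_split(np.arange(n), 4)           # four domains of four sites
    x, sweeps = solve_dd(A, b, blocks)
    print("sweeps:", sweeps, "residual:", np.linalg.norm(b - A @ x))
```

In a production setting such a sweep would more typically serve as a preconditioner inside an outer Krylov solver, with the block solves done approximately and possibly in reduced precision; the sketch keeps everything in double precision for clarity.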

arXiv.org e-Print Archive

Crossref

A nested Krylov subspace method to compute the sign function of large complex matrices

Author: Jacques C.R. Bloch
Simon Heybrock
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

We present an acceleration of the well-established Krylov-Ritz methods to compute the sign function of large complex matrices, as needed in lattice QCD simulations involving the overlap Dirac operator at both zero and nonzero baryon density. Krylov-Ritz methods approximate the sign function using a projection on a Krylov subspace. To achieve a high accuracy this subspace must be taken quite large, which makes the method too costly. The new idea is to make a further projection on an even smaller, nested Krylov subspace. If additionally an intermediate preconditioning step is applied, this projection can be performed without affecting the accuracy of the approximation, and a substantial gain in efficiency is achieved for both Hermitian and non-Hermitian matrices. The numerical efficiency of the method is demonstrated on lattice configurations of sizes ranging from 4^4 to 10^4, and the new results are compared with those obtained with rational approximation methods. Comment: 17 pages, 12 figures, minor corrections, extended analysis of the preconditioning ste
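
The basic Krylov-Ritz approximation that this abstract builds on can be sketched for the Hermitian case as follows. The sketch covers only the plain (un-nested) projection: it runs k Lanczos steps to obtain a basis V and a tridiagonal matrix T, then approximates sign(A) b by |b| V sign(T) e_1. The nested projection and the preconditioning step described in the abstract are not implemented. The function names `lanczos` and `krylov_ritz_sign` and the use of full reorthogonalization are assumptions chosen for a compact, numerically robust demonstration.

```python
import numpy as np

def lanczos(A, b, k):
    """Run k Lanczos steps on Hermitian A starting from b.

    Returns an orthonormal basis V (n x k) of the Krylov subspace and the
    real symmetric tridiagonal projection T (k x k).
    """
    n = b.shape[0]
    V = np.zeros((n, k), dtype=complex)
    alpha = np.zeros(k)
    beta = np.zeros(k - 1)
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(k):
        w = A @ V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        alpha[j] = np.real(np.vdot(V[:, j], w))
        w = w - alpha[j] * V[:, j]
        # Full reorthogonalization (an assumption; keeps the toy example stable).
        w = w - V[:, :j + 1] @ (V[:, :j + 1].conj().T @ w)
        if j < k - 1:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return V, T

def krylov_ritz_sign(A, b, k):
    """Approximate sign(A) @ b by projecting onto a k-dimensional Krylov subspace."""
    V, T = lanczos(A, b, k)
    theta, S = np.linalg.eigh(T)                 # Ritz values and vectors
    coeffs = S @ (np.sign(theta) * S[0, :])      # sign(T) applied to e_1
    return np.linalg.norm(b) * (V @ coeffs)
```

The cost driver the abstract refers to is visible here: the eigendecomposition of T grows with k, and k must be large for high accuracy, which is what motivates a second projection onto a smaller subspace.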

arXiv.org e-Print Archive

CiteSeerX

Crossref

QPACE 2 and Domain Decomposition on the Intel Xeon Phi

Author: Arts Paul
Bloch Jacques
Georg Peter
Glaessle Benjamin
Heybrock Simon
Komatsubara Yu
Lohmayer Robert
Mages Simon
Mendl Bernhard
Meyer Nils
Parcianello Alessio
Pleiter Dirk
Rappl Florian
Rossi Mauro
Solbrig Stefan
Tecchiolli Giampietro
Wettig Tilo
Zanier Gianpaolo
Publication venue
Publication date: 01/01/2015

We give an overview of QPACE 2, which is a custom-designed supercomputer based on Intel Xeon Phi processors, developed in a collaboration of Regensburg University and Eurotech. We give some general recommendations for how to write high-performance code for the Xeon Phi and then discuss our implementation of a domain-decomposition-based solver and present a number of benchmarks. Comment: plenary talk at Lattice 2014, to appear in the conference proceedings PoS(LATTICE2014), 15 pages, 9 figure

arXiv.org e-Print Archive

Juelich Shared Electronic Resources

Short-recurrence Krylov subspace methods for the overlap Dirac operator at nonzero chemical potential

Author: Andreas Frommer
Jacques C.R. Bloch
Katrin Schäfer
Simon Heybrock
Tilo Wettig
Tobias Breu
Publication venue: 'Elsevier BV'
Publication date: 14/05/2010

The overlap operator in lattice QCD requires the computation of the sign function of a matrix, which is non-Hermitian in the presence of a quark chemical potential. In previous work we introduced an Arnoldi-based Krylov subspace approximation, which uses long recurrences. Even after the deflation of critical eigenvalues, the low efficiency of the method restricts its application to small lattices. Here we propose new short-recurrence methods which strongly enhance the efficiency of the computational method. Using rational approximations to the sign function we introduce two variants, based on the restarted Arnoldi process and on the two-sided Lanczos method, respectively, which become very efficient when combined with multishift solvers. Alternatively, in the variant based on the two-sided Lanczos method the sign function can be evaluated directly. We present numerical results which compare the efficiencies of a restarted Arnoldi-based method and the direct two-sided Lanczos approximation for various lattice sizes. We also show that our new methods gain substantially when combined with deflation. Comment: 14 pages, 4 figures; as published in Comput. Phys. Commun., modified data in Figs. 2,3 and 4 for improved implementation of FOM algorithm, extended discussion of the algorithmic cos
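
To make the role of rational approximations and shifted systems concrete, the sketch below approximates sign(A) b for a non-Hermitian matrix using a partial-fraction expansion. It uses the simple "polar" approximation sign(x) ≈ tanh(2n artanh x), whose partial fractions give a sum over shifted inverses of A². This choice of rational function is an assumption for illustration, and the abstract does not say which one the authors use. Dense direct solves stand in for the multishift Krylov solver, which would handle all shifts from a single Krylov subspace. The name `sign_rational` is hypothetical.

```python
import numpy as np

def sign_rational(A, b, n_poles=20):
    """Approximate sign(A) @ b with a partial-fraction rational function.

    Uses sign(x) ~ (x / n) * sum_s 1 / (cos^2(t_s) * (x^2 + tan^2(t_s))),
    with t_s = pi * (s - 1/2) / (2 n), so every term is a shifted system
    (A^2 + sigma_s I) y = b with shift sigma_s = tan^2(t_s).
    """
    dim = A.shape[0]
    A2 = A @ A
    identity = np.eye(dim)
    acc = np.zeros(dim, dtype=complex)
    for s in range(1, n_poles + 1):
        t = np.pi * (s - 0.5) / (2 * n_poles)
        shift = np.tan(t) ** 2
        weight = 1.0 / np.cos(t) ** 2
        # Direct solve used here; a multishift solver would share one
        # Krylov subspace across all shifts instead.
        acc += weight * np.linalg.solve(A2 + shift * identity, b)
    return (A @ acc) / n_poles
```

The approximation is most accurate for eigenvalues whose magnitude is near 1, so in practice the matrix would be rescaled first; eigenvalues close to the imaginary axis converge slowly, which is one reason deflation of critical eigenvalues helps.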

arXiv.org e-Print Archive

Crossref